Speech decoder that detects stationary noise signal regions

ABSTRACT

A first determiner 121 tentatively determines whether the current processing unit represents a stationary noise period, based on stationary properties of a decoded signal. Based on the tentative determination result and a determination result of the periodicity of the decoded signal, a second determiner 124 determines whether the current processing unit represents a stationary noise period, thereby distinguishing a decoded signal including a stationary speech signal such as a stationary vowel from stationary noise and correctly identifying the stationary noise period.

TECHNICAL FIELD

The present invention relates to a speech decoding apparatus that decodes speech signals encoded at low bit rates in a mobile communication system and packet communication system (e.g. internet communication system). More particularly, the present invention relates to a CELP (Code Excited Linear Prediction) speech decoding apparatus that divides speech signals into the spectrum envelope component and the residual component.

BACKGROUND ART

In mobile communications, packet communications (e.g., internet communications) or speech storage, speech coding apparatuses are used for compressing speech information by using efficient encoding. This makes effective use of the capacity of transmission layer resources such as radio frequencies, or of the capacity of storage media. Among those, systems based on the CELP (Code Excited Linear Prediction) system are widely used in practice at medium and low bit rates. Techniques of CELP are described in M. R. Schroeder and B. S. Atal: “Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates”, Proc. ICASSP-85, 25.1.1, pages 937-940, 1985.

According to the CELP speech coding system, speech is divided into frames of a certain length (about 5 ms to 50 ms), linear prediction analysis is performed for each frame, and the prediction residual (i.e. excitation signal) from the linear prediction analysis is encoded using an adaptive code vector and a fixed code vector having the shapes of prescribed waveforms. The adaptive code vector is selected from an adaptive codebook that stores excitation vectors produced earlier. The fixed code vector is selected from a fixed codebook that stores a prescribed number of vectors of prescribed shapes. The fixed code vectors stored in the fixed codebook include random vectors and vectors produced by combining several pulses.

A prior-art CELP coding apparatus performs LPC (Linear Predictive Coefficient) analysis and quantization, pitch search, fixed codebook search and gain codebook search, using input digital signals, and transmits the LPC code (L), pitch period (A), fixed codebook index (F) and gain codebook index (G) to the decoding apparatus.

The decoding apparatus decodes the LPC code (L), pitch period (A), fixed codebook index (F) and gain codebook index (G), and, based on the decoding results, applies an excitation signal to a synthesis filter and produces the decoded signal.

However, with the prior-art speech decoding apparatus, it is difficult to distinguish signals that are stationary but are not noisy (e.g. stationary vowels) from stationary noise and to identify a stationary noise period.

DISCLOSURE OF INVENTION

It is therefore an object of the present invention to provide a speech decoding apparatus that correctly identifies the stationary noise signal period and decodes speech signals. To be more specific, it is an object of the present invention to provide a speech decoding apparatus and speech decoding method for identifying the speech period and the non-speech period, distinguishing periodic stationary signals from stationary noise signals (e.g. white noise) using the pitch period and adaptive code gain, and correctly identifying the stationary noise signal period.

To achieve the object, the present invention proposes an apparatus and method for tentatively evaluating the properties of stationary noise of a decoded signal, determining whether the current processing unit represents a stationary noise period based on the tentatively evaluated stationary noise properties and the periodicity of the decoded signal, separating the decoded signal containing a stationary speech signal such as stationary vowels from stationary noise, and correctly identifying the stationary noise period.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram showing a configuration of a stationary noise period identifying apparatus according to a first embodiment of the present invention;

FIG. 2 is a flowchart showing procedures of grouping of pitch history;

FIG. 3 is a diagram showing part of the flow of mode selection;

FIG. 4 is another diagram showing part of the flow of mode selection;

FIG. 5 is a diagram showing a configuration of a stationary noise post-processing apparatus according to a second embodiment of the present invention;

FIG. 6 is a diagram showing a configuration of a stationary noise post-processing apparatus according to a third embodiment of the present invention;

FIG. 7 is a diagram showing a speech decoding processing system according to a fourth embodiment of the present invention;

FIG. 8 is a flowchart showing the flow of the speech decoding system;

FIG. 9 is a diagram showing examples of memories provided in the speech decoding system and of initial values of the memories;

FIG. 10 is a diagram showing the flow of mode determination processing;

FIG. 11 is a diagram showing the flow of stationary noise addition processing; and

FIG. 12 is a diagram showing the flow of scaling.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be described below with reference to the accompanying drawings.

First Embodiment

FIG. 1 illustrates a configuration of a stationary noise period identifying apparatus according to the first embodiment of the present invention.

Given a digital signal input, an encoder (not shown) first performs an analysis and quantization of Linear Prediction Coefficients (LPC), pitch search, fixed codebook search and gain codebook search, and then transmits the LPC code (L), pitch period (A), fixed codebook index (F) and gain codebook index (G).

A code receiving apparatus 100 receives the encoded signal transmitted from the encoder, and separates the code L representing the LPC, a code A representing an adaptive code vector, code G representing gain information and code F representing a fixed code vector, from the received encoded signal. The code L, code A, code G and code F are output to a speech decoding apparatus 101. To be more specific, the code L is output to an LPC decoder 110, code A is output to an adaptive codebook 111, code G is output to a gain codebook 112, and code F is output to a fixed codebook 113.

Speech decoding apparatus 101 will be described first.

LPC decoder 110 decodes the LPCs from the code L and outputs the decoded LPCs to a synthesis filter 117. LPC decoder 110 converts the decoded LPCs into Line Spectrum Pair (LSP) parameters, which have better interpolation properties, and outputs these LSPs to an inter-subframe variation calculator 119, distance calculator 120 and average LSP calculator 125, which are provided in a stationary noise period detecting apparatus 102.

In general, the code L is an encoded version of the LSPs, and, in this case, LPC decoder 110 decodes the LSPs and then converts the decoded LSPs to LPCs. The LSP parameter is an example of spectrum envelope parameters representing the spectrum envelope component of a speech signal. Other examples include the PARCOR coefficients and the LPCs.

Adaptive codebook 111 provided in speech decoding apparatus 101 regularly updates excitation signals produced earlier and stores these signals, and produces an adaptive code vector using the adaptive codebook index (i.e. pitch period (pitch lag)) obtained by decoding the code A. The adaptive code vector produced in adaptive codebook 111 is multiplied by an adaptive code gain in an adaptive code gain multiplier 114, and the result is output to an adder 116. The pitch period obtained in adaptive codebook 111 is output to a pitch history analyzer 122 provided in stationary noise period detecting apparatus 102.

Gain codebook 112 stores a predetermined number of sets of adaptive codebook gains and fixed codebook gains (i.e. gain vectors), outputs the adaptive codebook gain component (i.e. adaptive code gain) of the gain vector, specified by the gain codebook index obtained by decoding the code G, to adaptive code gain multiplier 114 and a second determiner 124, and outputs the fixed codebook gain component (i.e. fixed code gain) of the gain vector to a fixed code gain multiplier 115.

Fixed codebook 113 stores a predetermined number of fixed code vectors of different shapes, and outputs a fixed code vector specified by a fixed codebook index obtained by decoding the code F to fixed code gain multiplier 115. Fixed code gain multiplier 115 multiplies the fixed code vector by the fixed code gain and outputs the result to adder 116.

Adder 116 adds the adaptive code vector from adaptive code gain multiplier 114 and the fixed code vector from fixed code gain multiplier 115 to produce an excitation signal for a synthesis filter 117, and outputs the excitation signal to synthesis filter 117 and adaptive codebook 111.
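The excitation path through multipliers 114 and 115 and adder 116 can be illustrated with a minimal sketch. The function name `build_excitation` and the list-based vector representation are assumptions made for illustration and do not appear in the specification.

```python
def build_excitation(adaptive_vec, fixed_vec, adaptive_gain, fixed_gain):
    """Sketch of multipliers 114/115 and adder 116 (names are hypothetical).

    Each code vector is scaled by its gain and the two scaled vectors are
    summed sample by sample to form the excitation for the synthesis filter.
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("code vectors must have the same subframe length")
    return [adaptive_gain * a + fixed_gain * f
            for a, f in zip(adaptive_vec, fixed_vec)]
```

In a full decoder the returned excitation would also be fed back into the adaptive codebook, as described for adder 116; that feedback is omitted here.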

Synthesis filter 117 configures an LPC synthesis filter using the LPCs from LPC decoder 110. Synthesis filter 117 filters the excitation signal from adder 116, synthesizes the decoded speech signal and outputs the synthesized decoded speech signal to a post-filter 118.

Post-filter 118 performs processing (e.g. formant enhancement and pitch enhancement) for improving the subjective quality of the signal synthesized by synthesis filter 117, and outputs the result as a post-filter output signal of speech decoding apparatus 101, to a power variation calculator 123 provided in stationary noise period detecting apparatus 102.

The above-described decoding by speech decoding apparatus 101 is carried out for every processing unit of a predetermined period (that is, for every frame of a few tens of milliseconds) or for every shorter processing unit (i.e. subframe). Cases will be described below where decoding is carried out on a per subframe basis.

Stationary noise period detecting apparatus 102 will be described below. A first stationary noise period detector 103 provided in stationary noise period detecting apparatus 102 will be explained first. First stationary noise period detector 103 and second stationary noise period detector 104 perform mode selection and determine whether the target subframe represents a stationary noise period or a speech signal period.

The LSPs from LPC decoder 110 are output to first stationary noise period detector 103 and stationary noise property extractor 105 provided in stationary noise period detecting apparatus 102. The LSPs input to first stationary noise period detector 103 are input to an inter-subframe variation calculator 119 and a distance calculator 120.

Inter-subframe variation calculator 119 calculates how much the LSPs have changed from the immediately preceding subframe. Specifically, based on the LSPs from LPC decoder 110, inter-subframe variation calculator 119 calculates the difference between the LSPs of the current subframe and the LSPs of the preceding subframe for each order, and outputs the sum of the squares of the differences, as the amount of inter-subframe variation, to a first determiner 121 and a second determiner 124.

In addition, it is preferable to use a smoothed version of the LSPs for calculating the amount of the variation so that the influence of quantization error fluctuations is minimized. Excessive smoothing is to be avoided, since it may result in poor responsiveness to variations between subframes. For example, to smooth the LSPs as shown in equation 1, it is preferable to set the value of k at about 0.7.

Smoothed LSPs [current subframe] = k × LSPs + (1 − k) × smoothed LSPs [preceding subframe]  (Equation 1)

Distance calculator 120 calculates the distance between the average LSPs in earlier stationary noise periods from an average LSP calculator 125 and the LSPs of the current subframe from LPC decoder 110, and outputs the calculation result to first determiner 121. For the distance between the average LSPs and the LSPs of the current subframe, for example, distance calculator 120 calculates the difference between the average LSPs from average LSP calculator 125 and the LSPs of the current subframe from LPC decoder 110, for each order, and outputs the sum of the squares of the differences. Distance calculator 120 may output the sum of the squares of the LSP differences calculated for each order, and may output, in addition, the LSP differences themselves. In addition to these values, distance calculator 120 may output the maximum value of the LSP differences. Thus, by outputting various measures of the distance to first determiner 121, it is possible to improve the reliability of determination in first determiner 121.

Based on the information from inter-subframe variation calculator 119 and distance calculator 120, first determiner 121 evaluates the degree of LSP variation between subframes and the similarity (i.e. distance) between the LSPs of the current subframe and the average LSPs of the stationary noise period. More specifically, these are determined using thresholds. If the LSP variation between subframes is small and the LSPs of the current subframe are similar to the average LSPs of the stationary noise period (that is, if the distance is small), the current subframe is determined to represent a stationary noise period, and this determination result (i.e. first determination result) is output to second determiner 124.

In this way, first determiner 121 tentatively determines whether the current subframe represents a stationary noise period, by first evaluating the stationary properties of the current subframe based on the amount of LSP variation between the preceding subframe and the current subframe, and by further evaluating the noise properties of the current subframe based on the distance between the average LSPs and the LSPs of the current subframe.

However, evaluation based solely on the LSPs may result in, for example, misidentification of a periodic stationary signal such as a stationary vowel or sine wave, as a noise signal. Therefore, second determiner 124 provided in second stationary noise period detector 104 described below analyzes the periodicity of the current subframe, and, based on the analysis result, determines whether the current subframe represents a stationary noise period. That is to say, since a signal having a strong periodicity is likely to be a stationary vowel or the like (not noise), second determiner 124 determines that the signal does not represent a stationary noise period.

Second stationary noise period detector 104 will be described below.

A pitch history analyzer 122 analyzes the fluctuations between subframes of the pitch periods, which are input from the adaptive codebook. Specifically, pitch history analyzer 122 temporarily stores the pitch periods of a predetermined number of subframes (e.g. ten subframes) from adaptive codebook 111, and groups these pitch periods (i.e. the pitch periods of the last ten subframes including the current subframe) by the method shown in FIG. 2.

The grouping will be described using as an example a case of grouping the pitch periods of the last ten subframes including the current subframe. FIG. 2 is a flowchart showing the steps of the grouping. First, in ST1001, the pitch periods are classified. More specifically, pitch periods having exactly the same value are sorted into the same class, while pitch periods having even slightly different values are sorted into different classes.

Next, in ST1002, classes having close pitch period values are grouped into one group. For example, pitch periods between which the difference is within 1 are sorted into one group. In this grouping, if there are five classes where the difference between adjacent pitch periods is within 1 (e.g. there are classes for the pitch periods of 30, 31, 32, 33 and 34), these five classes may be grouped as one group.

In ST1003, as a result of the grouping, an analysis result showing the number of groups into which the pitch periods of the last ten subframes including the current subframe are classified, is output. The smaller the number of groups shown in the result of analysis (minimum one), the more likely the decoded speech signal is periodic. On the other hand, the greater the number of groups, the less likely the decoded speech signal is periodic. Accordingly, if the decoded speech signal is stationary, it is possible to use the result of this analysis as a parameter representing periodic stationary signal properties (i.e. the periodicity of a stationary signal).
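The grouping of ST1001 to ST1003 can be sketched as follows. This is a minimal illustration that assumes neighbouring classes are chained into one group whenever adjacent values differ by at most 1, which is consistent with the 30 to 34 example; the function name `count_pitch_groups` and the `max_gap` parameter are hypothetical.

```python
def count_pitch_groups(pitch_history, max_gap=1):
    """Return the number of groups of close pitch periods (sketch of FIG. 2).

    pitch_history: pitch periods of the most recent subframes (e.g. ten).
    max_gap: assumed maximum difference between adjacent classes in a group.
    """
    # ST1001: identical pitch periods fall into the same class.
    classes = sorted(set(pitch_history))
    if not classes:
        return 0
    # ST1002: chain adjacent classes whose values differ by at most max_gap.
    groups = 1
    for prev, cur in zip(classes, classes[1:]):
        if cur - prev > max_gap:
            groups += 1
    # ST1003: the analysis result is the number of groups.
    return groups
```

A history such as 30, 31, 32, 33, 34 repeated twice yields one group, whereas ten widely scattered pitch periods yield many groups and therefore suggest low periodicity.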

A power variation calculator 123 receives, as input, the post-filter output signal from post filter 118 and average power information of the stationary noise period from an average noise power calculator 126. Power variation calculator 123 calculates the power of the output signal of post filter 118, and calculates the ratio of the power of the post-filter output signal to the average power of the signal in the stationary noise period. This power ratio is output to second determiner 124 and average noise power calculator 126. Power information of the post-filter output signal is also output to average noise power calculator 126. If the power (i.e. current signal power) of the output signal of post filter 118 is greater than the average power of the signal in the stationary noise period, there is a possibility that the current subframe contains a speech period. The average power of the signal in the stationary noise period and the power of the output signal of post filter 118 are used as parameters to detect, for example, the onset of speech that cannot be identified using other parameters. Instead of calculating and using the ratio of the power of the post-filter output signal to the average power of the signal in the stationary noise period, power variation calculator 123 may calculate and use the difference between these powers as a parameter.

As described above, the output of pitch history analyzer 122 (i.e. information showing the number of groups into which earlier pitch periods are classified) and the adaptive code gain from gain codebook 112 are input to second determiner 124. Using this information, second determiner 124 evaluates the periodicity of the post-filter output signal. In addition, the following information is input to second determiner 124: the first determination result from first determiner 121, the ratio of the power of the signal in the current subframe to the average power of the signal in the stationary noise period from power variation calculator 123, and the amount of inter-subframe LSP variation from inter-subframe variation calculator 119. Based on this information and the determination result of the periodicity, second determiner 124 determines whether the current subframe represents a stationary noise period, and outputs this determination result to subsequent processing apparatus. The determination result is also output to average LSP calculator 125 and average noise power calculator 126. In addition, any of the three apparatuses, namely code receiving apparatus 100, speech decoding apparatus 101 and stationary noise period detecting apparatus 102, may have a decoder that decodes information, contained in a received code, showing the presence or absence of a voiced stationary signal, and outputs the decoded information to second determiner 124.

Stationary noise property extractor 105 will be described below.

Average LSP calculator 125 receives, as input, the determination result from second determiner 124 and the LSPs of the current subframe from speech decoding apparatus 101 (more specifically, from LPC decoder 110). If the determination result provided by second determiner 124 indicates a stationary noise period, average LSP calculator 125 recalculates the average LSPs in the stationary noise period using the LSPs of the current subframe. The average LSPs are recalculated using, for example, an autoregressive model smoothing algorithm. The recalculated average LSPs are output to distance calculator 120.

Average noise power calculator 126 receives, as input, the determination result from second determiner 124, and the power of the post-filter output signal and the ratio of the power of the post-filter output signal to the average power of the signal in the stationary noise period, from power variation calculator 123. If the determination result from second determiner 124 shows a stationary noise period, or if the determination result does not indicate a stationary noise period yet the power ratio is nevertheless less than a predetermined threshold (that is, if the power of the post-filter output signal of the current subframe is less than the average power of the signal in the stationary noise period), average noise power calculator 126 recalculates the average power (i.e. average noise power) of the signal in the stationary noise period using the post-filter output signal power. The average noise power is recalculated using, for example, an autoregressive model smoothing algorithm. In this case, by adding control that moderates the smoothing if the power ratio decreases (so that the post-filter output signal power of the current subframe is reflected more strongly), it is possible to decrease the level of the average noise power promptly if the background noise level decreases rapidly in a speech period. The recalculated average noise power is output to power variation calculator 123.

In the above, the LPCs, LSPs and average LSPs are parameters representing the spectrum envelope component of a speech signal, while the adaptive code vector, fixed code vector, adaptive code gain and fixed code gain are parameters representing the residual component of the speech signal. Parameters representing the spectrum envelope component and parameters representing the residual component are not limited to the examples given herein.

The steps of processing in first determiner 121, second determiner 124 and stationary noise property extractor 105 are described below with reference to FIGS. 3 and 4. In FIGS. 3 and 4, ST1101 to ST1107 are principally performed in first stationary noise period detector 103, ST1108 to ST1117 are principally performed in second stationary noise period detector 104, and ST1118 to ST1120 are principally performed in stationary noise property extractor 105.

In ST1101, the LSPs of the current subframe are calculated and smoothed according to equation 1 given earlier. In ST1102, the difference (that is, the amount of variation) between the LSPs of the current subframe and the LSPs of the immediately preceding subframe is calculated. ST1101 and ST1102 are performed in inter-subframe variation calculator 119 described earlier.

An example of the method of calculating the amount of inter-subframe LSP variation in variation calculator 119 is shown in equation 1′, equation 2 and equation 3. Equation 1′ smoothes the LSPs of the current subframe, equation 2 provides the difference of the smoothed LSPs between subframes as a sum of squares, and equation 3 further smoothes the sum of the squares of the LSP differences between subframes.

L′i(t)=0.7×Li(t)+0.3×L′i(t−1)  (Equation 1′)

$DL(t) = \sum_{i=1}^{p} \left[ L'_i(t) - L'_i(t-1) \right]^2 \qquad \text{(Equation 2)}$

DL′(t)=0.1×DL(t)+0.9×DL′(t−1)  (Equation 3)

In these equations, L′i(t) represents the smoothed LSP parameter of the i-th order in the t-th subframe, Li(t) represents the LSP parameter of the i-th order in the t-th subframe, DL(t) represents the amount of LSP variation in the t-th subframe (i.e. the sum of the squares of LSP differences between subframes), DL′(t) represents a smoothed version of the amount of LSP variation in the t-th subframe (i.e. a smoothed version of the sum of the squares of LSP differences between subframes), and p represents the LSP (LPC) analysis order. In this example, DL′(t) is calculated in inter-subframe variation calculator 119 using equation 1′, equation 2 and equation 3, and then used in mode determination as the amount of inter-subframe LSP variation.
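Equations 1′, 2 and 3 can be combined into a small stateful sketch. The class name `LspVariationTracker` and the zero initial state are assumptions for illustration; the specification does not state initial values in this passage.

```python
class LspVariationTracker:
    """Sketch of inter-subframe variation calculator 119 (hypothetical name)."""

    def __init__(self, order, k=0.7, alpha=0.1):
        self.k = k                          # LSP smoothing factor (Equation 1')
        self.alpha = alpha                  # variation smoothing factor (Equation 3)
        self.smoothed_lsp = [0.0] * order   # L'_i(t-1); zero start is an assumption
        self.smoothed_dl = 0.0              # DL'(t-1); zero start is an assumption

    def update(self, lsp):
        """Consume the LSPs of one subframe and return DL'(t)."""
        # Equation 1': smooth the LSPs of the current subframe.
        new_smoothed = [self.k * cur + (1.0 - self.k) * prev
                        for cur, prev in zip(lsp, self.smoothed_lsp)]
        # Equation 2: sum of squared differences of the smoothed LSPs.
        dl = sum((new - prev) ** 2
                 for new, prev in zip(new_smoothed, self.smoothed_lsp))
        # Equation 3: smooth the variation amount itself.
        self.smoothed_dl = self.alpha * dl + (1.0 - self.alpha) * self.smoothed_dl
        self.smoothed_lsp = new_smoothed
        return self.smoothed_dl
```

Calling `update` once per subframe keeps the previous smoothed LSPs and the previous DL′ inside the object, so the caller only supplies the newly decoded LSPs.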

In ST1103, distance calculator 120 calculates the distance between the LSPs of the current subframe and the average LSPs in earlier noise periods. Equation 4 and equation 5 show an example of the distance calculation in distance calculator 120.

$D(t) = \sum_{i=1}^{p} \left[ L_i(t) - LN_i \right]^2 \qquad \text{(Equation 4)}$

$DX(t) = \max_{i=1,\ldots,p} \left[ L_i(t) - LN_i \right]^2 \qquad \text{(Equation 5)}$

Equation 4 defines the distance between the average LSPs in earlier noise periods and the LSPs in the current subframe by the sum of the squares of the differences in all orders. Equation 5 defines the distance by the square of the difference in the one order whose difference is the largest among all orders. LNi represents the average LSPs in earlier noise periods and is updated on a per subframe basis in a noise period, using, for example, equation 6.

LNi=0.95×LNi+0.05×Li(t)  (Equation 6)

In this example, D(t) and DX(t) are determined in distance calculator 120 using equation 4, equation 5 and equation 6, and then used in mode determination as information representing the distance from the LSPs in the stationary noise period.
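The distance measures of equations 4 and 5 and the average update of equation 6 can be sketched as below. The function names are hypothetical, and LSPs are represented as plain lists of floats for illustration.

```python
def lsp_distances(lsp, avg_noise_lsp):
    """Return (D, DX): sum and maximum of squared per-order LSP differences."""
    squared = [(cur - avg) ** 2 for cur, avg in zip(lsp, avg_noise_lsp)]
    return sum(squared), max(squared)   # Equation 4, Equation 5


def update_avg_noise_lsp(avg_noise_lsp, lsp, weight=0.05):
    """Equation 6: move the average noise LSPs slightly toward the current LSPs.

    Intended to be called only for subframes judged to be stationary noise.
    """
    return [(1.0 - weight) * avg + weight * cur
            for avg, cur in zip(avg_noise_lsp, lsp)]
```

Returning a new list from `update_avg_noise_lsp` rather than mutating in place is a design choice of this sketch, so that the caller decides when the updated average takes effect.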

In ST1104, power variation calculator 123 calculates the power of the post-filter output signal (i.e. the output signal from post filter 118). This power calculation is performed in power variation calculator 123 described earlier, using equation 7, for example.

$P = \sqrt{\sum_{i=0}^{N} \left[ S(i) \times S(i) \right]} \qquad \text{(Equation 7)}$

In equation 7, S(i) is the post-filter output signal, and N is the length of the subframe. The power calculation in ST1104 is performed in power variation calculator 123 provided in second stationary noise period detector 104 as shown in FIG. 1. This power calculation needs to be performed before ST1108 but is not limited to ST1104.
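As a minimal sketch of equation 7, the power of one subframe can be computed as the square root of the sum of squared samples. The function name `subframe_power` is hypothetical, and the sketch simply sums over all samples passed in rather than fixing the summation bounds.

```python
import math


def subframe_power(samples):
    """Equation 7 sketch: square root of the sum of squared samples."""
    return math.sqrt(sum(s * s for s in samples))
```

Because no normalization by the subframe length is applied, values are only comparable between subframes of equal length, which holds when every subframe has the same number of samples.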

In ST1105, the stationary noise properties of the decoded signal are evaluated. To be more specific, it is determined whether both the amount of LSP variation calculated in ST1102 and the distance calculated in ST1103 are small. Thresholds are set for the amount of LSP variation calculated in ST1102 and the distance calculated in ST1103. If the amount of LSP variation calculated in ST1102 is below the threshold and the distance calculated in ST1103 is below the threshold, the stationary noise properties are high and the flow proceeds to ST1107. For example, with respect to DL′, D and DX described earlier, if the LSPs are normalized in the range between 0.0 and 1.0, using the following thresholds improves the reliability of the above determination.

Threshold for DL′: 0.0004

Threshold for D: 0.003+D′

Threshold for DX: 0.0015

D′ is the average value of D in the noise period, and is calculated as shown in equation 8 in the noise period.

D′=0.05×D(t)+0.95×D′  (Equation 8)

LNi is the average of the LSPs in earlier noise periods, and has a reliable value only when a sufficient number of noise subframes (e.g. 20 subframes) are available for sampling. Therefore, D and DX are not used in the evaluation of stationary noise properties in ST1105 if the previous noise period is shorter than a predetermined time length (e.g. 20 subframes).
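The threshold test of ST1105 can be sketched as follows, using the example thresholds for LSPs normalized to the range 0.0 to 1.0. The function and parameter names are hypothetical, and the sketch assumes that all applicable measures must fall below their thresholds and that the decision rests on the LSP variation alone while the noise history is too short.

```python
def looks_like_stationary_noise(dl_smoothed, d, dx, d_avg,
                                noise_subframes_seen,
                                min_noise_subframes=20):
    """Tentative ST1105 decision (sketch under the assumptions stated above).

    dl_smoothed: smoothed inter-subframe LSP variation DL'(t)
    d, dx:       distances D(t) and DX(t) from the average noise LSPs
    d_avg:       running average D' of D in noise periods (Equation 8)
    """
    if dl_smoothed >= 0.0004:
        return False                      # spectrum is changing too quickly
    if noise_subframes_seen < min_noise_subframes:
        return True                       # average LSPs unreliable: skip D and DX
    return d < 0.003 + d_avg and dx < 0.0015


def update_d_avg(d_avg, d, weight=0.05):
    """Equation 8: update D' in a noise period."""
    return weight * d + (1.0 - weight) * d_avg
```

A true result here corresponds to proceeding to ST1107; the later steps may still override this tentative decision.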

In ST1107, the current subframe is determined as a stationary noise period, and the flow proceeds to ST1108. Meanwhile, if either the amount of LSP variation calculated in ST1102 or the LSP distance calculated in ST1103 is greater than the threshold, the current subframe is determined to have low stationary properties, and the flow shifts to ST1106. In ST1106, it is determined that the subframe does not represent a stationary noise period (in other words, the subframe is determined to represent a speech period), and the flow proceeds to ST1110.

In ST1108, it is determined whether the power of the current subframe is greater than the average power of earlier stationary noise periods. Specifically, a threshold for the output of power variation calculator 123 (the ratio of the power of the post-filter output signal to the average power of the stationary noise period) is set, and, if the ratio of the power of the post-filter output signal to the average power of the stationary noise period is greater than the threshold, the flow proceeds to ST1109. In ST1109, the current subframe is determined to represent a speech period.

For example, using 2.0 for this threshold improves the reliability of the above determination. If the power P of the post-filter output signal calculated using equation 7 is greater than twice the average power PN′ of the stationary noise period, the flow proceeds to ST1109. The average power PN′ is updated on a per subframe basis in the stationary noise period using equation 9, for example.

PN′=0.9×PN′+0.1×P  (Equation 9)

If the amount of power variation is less than the threshold, the flow proceeds to ST1112. In this case, the determination result in ST1107 is maintained and the current subframe is determined to represent a stationary noise period.
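The power check of ST1108 and the update of equation 9 can be sketched as follows. The function names are hypothetical; the comparison is written as a product rather than a ratio so that an average noise power of zero does not cause a division error, which is a choice of this sketch.

```python
def is_power_jump(power, avg_noise_power, ratio_threshold=2.0):
    """ST1108 sketch: True if P exceeds ratio_threshold times PN'."""
    return power > ratio_threshold * avg_noise_power


def update_avg_noise_power(avg_noise_power, power, weight=0.1):
    """Equation 9: PN' = 0.9 * PN' + 0.1 * P, applied in noise periods."""
    return (1.0 - weight) * avg_noise_power + weight * power
```

A true result from `is_power_jump` corresponds to the branch to ST1109 (speech period); a false result corresponds to keeping the ST1107 decision and moving on to ST1112.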

Next, in ST1110, it is checked how long the stationary state has lasted and whether the stationary state is a stationary voiced speech state. Then, if the current subframe does not represent a stationary voiced speech state and the stationary state has lasted a predetermined time, the flow proceeds to ST1111, and, in ST1111, the current subframe is determined to represent a stationary noise period.

Specifically, whether the current subframe is in a stationary state is determined using the output from inter-subframe variation calculator 119 (i.e. the amount of inter-subframe variation). In other words, if the inter-subframe variation amount from ST1102 is small (i.e. less than a predetermined threshold), the current subframe is determined to represent a stationary state. The same threshold as in ST1105 may be used. If the current subframe is thus determined to represent a stationary state, it is checked how long this state has lasted.

Whether the current subframe represents a stationary voiced speech state is determined based on information showing whether the current subframe represents stationary voiced speech, provided from stationary noise period detecting apparatus 102. For example, if the transmitted code information contains the above information as mode information, whether the current subframe represents a stationary voiced speech state is determined using the decoded mode information. Otherwise, a section provided in stationary noise period detecting apparatus 102 to evaluate voiced stationary properties may output the above information, and, using this information, it is determined whether the current subframe represents a stationary voiced speech state.

If, as a result of the check, the stationary state has lasted a predetermined time (e.g. 20 subframes or longer) and the current subframe does not represent a stationary voiced speech state, the current subframe is determined to represent a stationary noise period in ST1111, even if in ST1108 the power variation is determined to be large, and then the flow proceeds to ST1112. On the other hand, if ST1110 yields a negative result (that is, if the current subframe represents a voiced stationary period or if a stationary state has not lasted a predetermined time), the determination that the current subframe represents a speech period is maintained, and the flow proceeds to ST1114.

Next, if the current subframe is determined to represent a stationarynoise period up till this point, whether the periodicity of the decodedsignal is high is determined in ST1112. To be more specific, based onthe adaptive code gain from speech decoding apparatus 101 (that is, fromgain codebook 112) and the pitch history analysis result from pitchhistory analyzer 122, second determiner 124 evaluates the periodicity ofthe decoded signal in the current subframe. In this case, the adaptivecode gain is preferably subjected to processing of autoregressive modelsmoothing so as to smooth the variations between subframes.

In this periodicity evaluation, for example, a threshold is set for the adaptive code gain after smoothing processing (i.e. the smoothed adaptive code gain), and, if the smoothed adaptive code gain is greater than the predetermined threshold, the periodicity is determined to be high, and the flow proceeds to ST1113. In ST1113, the current subframe is determined to represent a speech period.

Further, if the number of groups into which the pitch periods of earlier subframes are classified is small in the pitch history analysis result, periodic signals are likely to be continuing. Therefore, the periodicity is also evaluated based on this number of groups. For example, if the pitch periods of the past ten subframes are classified into three or fewer groups, it is likely that periodic signals are continuing in the current period, so the flow proceeds to ST1113, where the current subframe is determined to represent a speech period, not a stationary noise period.

If ST1112 yields a negative result (that is, if the smoothed adaptive code gain is less than the predetermined threshold and the number of groups into which the pitch periods of earlier subframes are classified is large in the pitch history analysis result), the determination that the current subframe represents a stationary noise period is kept, and the flow proceeds to ST1115.
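The periodicity check of ST1112 can be illustrated with the following minimal sketch. The function names, the gain threshold of 0.7, the smoothing coefficient of 0.1 and the merge threshold of 2 are hypothetical values chosen for illustration only; the rule of merging neighbouring pitch classes by their sorted difference is likewise one assumed reading of the pitch history analysis, not a definitive implementation. Only the "three or fewer groups over the past ten subframes" example is taken from the description above.

```python
def smooth_gain(prev_smoothed, gain, k=0.1):
    """Autoregressive smoothing of the adaptive code gain (k is assumed)."""
    return (1.0 - k) * prev_smoothed + k * gain


def count_pitch_groups(pitch_periods, merge_threshold=2):
    """Count groups of pitch periods in the history.

    Distinct pitch values form classes; neighbouring classes whose
    difference is below merge_threshold are merged into one group.
    """
    classes = sorted(set(pitch_periods))
    if not classes:
        return 0
    groups = 1
    for prev, cur in zip(classes, classes[1:]):
        if cur - prev >= merge_threshold:
            groups += 1
    return groups


def is_periodic(smoothed_gain, pitch_history,
                gain_threshold=0.7, max_groups=3, merge_threshold=2):
    """Return True when the subframe should be treated as speech (ST1113)."""
    if smoothed_gain > gain_threshold:
        return True
    return count_pitch_groups(pitch_history, merge_threshold) <= max_groups


if __name__ == "__main__":
    vowel = [40, 40, 41, 40, 80, 81, 40, 40, 41, 40]
    noise = [20, 57, 103, 33, 140, 76, 91, 45, 120, 64]
    print(is_periodic(0.2, vowel), is_periodic(0.2, noise))
```

In this sketch a stable pitch track (including pitch doubling) collapses into a few groups and is kept as speech, whereas scattered pitch values with a low smoothed gain remain classified as stationary noise.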

If a determination result showing a speech period has been provided up to this point, the flow proceeds to ST1114, and a predetermined number of hangover subframes (e.g. 10) is set on the hangover counter. This number is the initial value of the hangover counter, which is then decremented by 1 every time a stationary noise period is identified through ST1101 to ST1113. If the hangover counter shows “0”, the current subframe is definitively determined to represent a stationary noise period.

If a determination result showing a stationary noise period has been provided up to this point, the flow proceeds to ST1115, and it is checked whether the hangover counter is within the hangover range (i.e. the range between 1 and the number of hangover subframes). In other words, it is checked whether the hangover counter shows “0”. If the hangover counter is within the hangover range, the flow proceeds to ST1116. In ST1116, the current subframe is determined to represent a speech period, and, following this, in ST1117, the hangover counter is decremented by 1. If the counter is not in the hangover range (that is, when the counter shows “0”), the determination that the current subframe represents a stationary noise period is kept, and the flow proceeds to ST1118.
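The hangover handling of ST1114 to ST1117 may be sketched as follows. The function name and return convention are hypothetical; the hangover length of 10 subframes is the example value given above.

```python
HANGOVER_SUBFRAMES = 10  # example value from the description


def apply_hangover(tentative_is_noise, counter, hangover=HANGOVER_SUBFRAMES):
    """Return (final_is_noise, new_counter) for one subframe.

    tentative_is_noise is the result of the determination up to ST1113.
    """
    if not tentative_is_noise:
        # ST1114: speech period, so reload the hangover counter.
        return False, hangover
    if counter > 0:
        # ST1116/ST1117: still in the hangover range, report speech
        # and decrement the counter.
        return False, counter - 1
    # Counter shows 0: the stationary noise determination is kept.
    return True, 0


if __name__ == "__main__":
    _, counter = apply_hangover(False, 0)
    for i in range(12):
        is_noise, counter = apply_hangover(True, counter)
        print(i, is_noise, counter)
```

Under these assumptions, the first ten subframes following a speech period are still reported as speech, and only subsequent subframes are reported as stationary noise.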

If the determination result shows a stationary noise period, average LSP calculator 125 updates the average LSPs in the stationary noise period in ST1118. This updating is performed using, for example, equation 6. Otherwise, the previous value is maintained without updating. In addition, if the time previously determined to represent a stationary noise period is short, the smoothing coefficient, 0.95, in equation 6 may be made smaller.

In ST1119, average noise power calculator 126 updates the average noise power. The updating is performed, for example, using equation 9, if the determination result shows a stationary noise period. Otherwise, the previous value is maintained without updating. However, even if the determination result does not show a stationary noise period, if the power of the current post-filter output signal is below the average noise power, the average noise power is updated using equation 9 with the smoothing coefficient 0.9 replaced by a smaller value, so as to decrease the average noise power. By this means, it is possible to accommodate cases where the background noise level suddenly decreases during a speech period.
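The update rule of ST1119 may be sketched as below. Equation 9 is not reproduced in this part of the description, so the sketch assumes it has the autoregressive form PN′ = 0.9 × PN′ + 0.1 × P; the function name and the faster coefficient of 0.5 used for the downward update are hypothetical.

```python
def update_average_noise_power(avg_power, current_power, is_noise,
                               k_noise=0.9, k_fast=0.5):
    """Return the updated average noise power for one subframe.

    Assumes equation 9 is avg = k * avg + (1 - k) * current.
    k_fast (< k_noise) is an assumed coefficient for the case where the
    signal power falls below the average during a speech period.
    """
    if is_noise:
        return k_noise * avg_power + (1.0 - k_noise) * current_power
    if current_power < avg_power:
        # Background level dropped during speech: track downward quickly.
        return k_fast * avg_power + (1.0 - k_fast) * current_power
    # Speech period with higher power: keep the previous value.
    return avg_power


if __name__ == "__main__":
    print(update_average_noise_power(100.0, 50.0, True))
    print(update_average_noise_power(100.0, 50.0, False))
    print(update_average_noise_power(100.0, 200.0, False))
```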

Finally, in ST1120, second determiner 124 outputs the determination result, average LSP calculator 125 outputs the updated average LSPs, and average noise power calculator 126 outputs the updated average noise power.

As described above, according to this embodiment, if it is determined that a subframe represents a stationary noise period according to the evaluation of stationary properties using the LSPs, the degree of the periodicity of the subframe is evaluated using the adaptive code gain and the pitch period, and, based on this degree of periodicity, it is checked again whether the subframe represents a stationary noise period. Accordingly, it is possible to correctly identify signals that are stationary yet not noisy, such as sine waves and stationary vowels.

Second Embodiment

FIG. 5 illustrates the configuration of a stationary noise post-processing apparatus according to the second embodiment of the present invention. In FIG. 5, the same parts as in FIG. 1 are assigned the same reference numerals as in FIG. 1, and specific descriptions thereof are omitted.

A stationary noise post-processing apparatus 200 is comprised of a noise generator 201, adder 202 and scaling section 203. In stationary noise post-processing apparatus 200, adder 202 adds a pseudo stationary noise signal generated in noise generator 201 and the post-filter output signal from speech decoding apparatus 101, scaling section 203 adjusts the power of the post-filter output signal after the addition by performing scaling processing, and the resulting post-filter output signal becomes the output of stationary noise post-processing apparatus 200.

Noise generator 201 is comprised of an excitation generator 210, synthesis filter 211, LSP/LPC converter 212, multiplier 213, multiplier 214 and gain adjuster 215. Scaling section 203 is comprised of a scaling coefficient calculator 216, inter-subframe smoother 217, inter-sample smoother 218 and multiplier 219.

The operation of stationary noise post-processing apparatus 200 of the above-mentioned configuration will be described below.

Excitation generator 210 selects a fixed code vector at random from fixed codebook 113 provided in speech decoding apparatus 101, and, based on the selected fixed code vector, generates a noise excitation signal and outputs this signal to synthesis filter 211. The noise excitation signal need not be generated based on a fixed code vector selected from fixed codebook 113 provided in speech decoding apparatus 101, and an optimal method may be chosen system by system in view of the computational complexity, memory requirements, the properties of the noise signal to be generated, etc. Generally, using a fixed code vector selected from fixed codebook 113 provided in speech decoding apparatus 101 proves effective. LSP/LPC converter 212 converts the average LSPs from average LSP calculator 125 into LPCs and outputs the LPCs to synthesis filter 211.

Synthesis filter 211 configures an LPC synthesis filter using the LPCs from LSP/LPC converter 212. Synthesis filter 211 performs filtering processing using the noise excitation signal from excitation generator 210 and synthesizes the noise signal, and outputs the synthesized noise signal to multiplier 213 and gain adjuster 215.

Gain adjuster 215 calculates the gain adjustment coefficient for adjusting the power of the output signal of synthesis filter 211 to the average noise power from average noise power calculator 126. The gain adjustment coefficient is subjected to smoothing processing for realizing a smooth continuity between subframes and is furthermore subjected to smoothing processing on a per sample basis for realizing a smooth continuity within each subframe. Finally, the gain adjustment coefficient is output to multiplier 213 for each sample. Specifically, the gain adjustment coefficient is obtained according to equation 10, equation 11 and equation 12.

Psn′ = 0.9 × Psn′ + 0.1 × Psn  (Equation 10)

Scl = PN′ / Psn′  (Equation 11)

Scl′ = 0.85 × Scl′ + 0.15 × Scl  (Equation 12)

In these equations, Psn is the power of the noise signal synthesized by synthesis filter 211 (calculated as shown in equation 7), and Psn′ is a version of Psn smoothed between subframes and updated using equation 10. PN′ is the power of the stationary noise signal given by equation 9, and Scl is the scaling coefficient in the processing frame. Scl′ is the gain adjustment coefficient, which is employed on a per sample basis and updated on a per sample basis using equation 12.
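Equations 10 to 12 can be illustrated with the following minimal sketch. The class name, the initial values and the use of the mean square as the power measure are assumptions (equation 7 is not reproduced in this part of the description); the coefficients 0.9/0.1 and 0.85/0.15 are those of the equations above.

```python
class GainAdjusterSketch:
    """Hypothetical illustration of equations 10 to 12."""

    def __init__(self, psn_smoothed=1.0, scl_smoothed=1.0):
        self.psn_smoothed = psn_smoothed  # Psn' (state across subframes)
        self.scl_smoothed = scl_smoothed  # Scl' (state across samples)

    def update(self, noise_subframe, target_power):
        """Return one gain adjustment coefficient per sample."""
        # Power of the synthesized noise (mean square is an assumption).
        psn = sum(x * x for x in noise_subframe) / len(noise_subframe)
        # Equation 10: inter-subframe smoothing of the noise power.
        self.psn_smoothed = 0.9 * self.psn_smoothed + 0.1 * psn
        # Equation 11: ratio of the average noise power PN' to Psn'.
        scl = target_power / self.psn_smoothed if self.psn_smoothed > 0 else 1.0
        coefficients = []
        for _ in noise_subframe:
            # Equation 12: per-sample smoothing toward Scl.
            self.scl_smoothed = 0.85 * self.scl_smoothed + 0.15 * scl
            coefficients.append(self.scl_smoothed)
        return coefficients


if __name__ == "__main__":
    adjuster = GainAdjusterSketch(psn_smoothed=4.0, scl_smoothed=1.0)
    print(adjuster.update([2.0, -2.0, 2.0, -2.0], target_power=8.0))
```

The per-sample coefficients approach Scl gradually rather than jumping at the subframe boundary, which is the continuity property described above.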

Multiplier 213 multiplies the gain adjustment coefficient from gain adjuster 215 with the noise signal from synthesis filter 211. The gain adjustment coefficient may vary for each sample. The multiplication result is output to multiplier 214.

In order to adjust the absolute level of the noise signal to be generated, multiplier 214 multiplies the output signal from multiplier 213 with a predetermined constant (e.g. about 0.5). Multiplier 214 may be incorporated in multiplier 213. The level-adjusted signal (i.e. stationary noise signal) is output to adder 202. In the above-described way, a stationary noise signal maintaining a smooth continuity is generated.

Adder 202 adds the stationary noise signal generated in noise generator 201 and the post-filter output signal from speech decoding apparatus 101 (more specifically, post filter 118), and adder 202 outputs the result to scaling section 203 (more specifically, to scaling coefficient calculator 216 and multiplier 219).

Scaling coefficient calculator 216 calculates both the power of the post-filter output signal from speech decoding apparatus 101 (more specifically, post filter 118) and the power of the post-filter output signal from adder 202 after the addition of the stationary noise signal. By calculating the ratio between these powers, scaling coefficient calculator 216 obtains a scaling coefficient that minimizes the signal power difference between the decoded signal (to which stationary noise is not yet added) and the scaled signal, and outputs the calculated coefficient to inter-subframe smoother 217. Specifically, the scaling coefficient “SCALE” is determined as shown in equation 13.

SCALE = P / P′  (Equation 13)

P is the power of the post-filter output signal, calculated by equation 7, and P′ is the power of the sum signal of the post-filter output signal and the stationary noise signal, calculated by the same equation as for P.

Inter-subframe smoother 217 performs smoothing processing of the scaling coefficient between subframes so that the scaling coefficient varies moderately between subframes. This smoothing is not performed (or is performed very weakly) during the speech period, to avoid smoothing the power of the speech signal itself and making the responsivity to power variation poor. Whether the current subframe represents a speech period is determined based on the determination result from second determiner 124 shown in FIG. 1. The smoothed scaling coefficient is output to inter-sample smoother 218. The smoothed scaling coefficient SCALE′ is updated by equation 14.

SCALE′ = 0.9 × SCALE′ + 0.1 × SCALE  (Equation 14)

Inter-sample smoother 218 performs smoothing processing of the scaling coefficient between samples so that the scaling coefficient varies moderately between samples. This smoothing may be performed by autoregressive model smoothing processing. Specifically, the smoothed coefficient “SCALE″” per sample is updated by equation 15.

SCALE″ = 0.85 × SCALE″ + 0.15 × SCALE′  (Equation 15)

In this way, the scaling coefficient is smoothed between samples and made to vary little by little per sample, so that it is possible to prevent the scaling coefficient from being discontinuous across or near frame boundaries. The scaling coefficient is calculated for each sample and output to multiplier 219.
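The scaling of equations 13 to 15 may be sketched as follows. The class and attribute names are hypothetical, the mean square is assumed as the power measure, and bypassing the inter-subframe smoothing entirely during speech periods is one of the two options mentioned above (the other being very weak smoothing). The sketch applies the coefficient to the samples exactly as the equations state it.

```python
def mean_power(signal):
    """Mean-square power (an assumed stand-in for equation 7)."""
    return sum(v * v for v in signal) / len(signal)


class ScalingSectionSketch:
    """Hypothetical illustration of equations 13 to 15."""

    def __init__(self):
        self.scale_sub = 1.0  # SCALE' (inter-subframe state)
        self.scale_smp = 1.0  # SCALE'' (inter-sample state)

    def process(self, decoded, noisy, is_noise_period):
        """Scale `noisy` (decoded signal plus stationary noise)."""
        p = mean_power(decoded)
        p_sum = mean_power(noisy)
        scale = p / p_sum if p_sum > 0 else 1.0  # Equation 13
        if is_noise_period:
            # Equation 14: smooth between subframes in noise periods only.
            self.scale_sub = 0.9 * self.scale_sub + 0.1 * scale
        else:
            # Speech period: follow the power variation without smoothing.
            self.scale_sub = scale
        out = []
        for v in noisy:
            # Equation 15: per-sample smoothing.
            self.scale_smp = 0.85 * self.scale_smp + 0.15 * self.scale_sub
            out.append(self.scale_smp * v)
        return out


if __name__ == "__main__":
    section = ScalingSectionSketch()
    print(section.process([1.0] * 4, [2.0] * 4, is_noise_period=True))
```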

Multiplier 219 multiplies the scaling coefficient from inter-sample smoother 218 with the post-filter output signal from adder 202, to which the stationary noise signal has been added, and outputs the result as the final output signal.

In the above configuration, the average noise power from average noise power calculator 126, the LPCs from LSP/LPC converter 212 and the scaling coefficient from scaling coefficient calculator 216 are parameters used in post-processing.

Thus, according to this embodiment, noise is generated in noise generator 201 and added to the decoded signal (i.e. post-filter output signal), and then scaling section 203 performs the scaling of the decoded signal. In this way, the decoded signal with noise is subjected to scaling so that the power of the decoded signal with noise added is close to the power of the decoded signal without noise added. Further, the present embodiment utilizes both inter-subframe smoothing and inter-sample smoothing, so that stationary noise becomes smoother, thereby improving the subjective quality of stationary noise.

Third Embodiment

FIG. 6 illustrates a configuration of a stationary noise post-processing apparatus according to the third embodiment of the present invention. In FIG. 6, the same parts as in FIG. 5 are assigned the same reference numerals as in FIG. 5, and specific descriptions thereof are omitted.

In addition to the configuration of stationary noise post-processing apparatus 200 shown in FIG. 5, the apparatus in this embodiment further comprises memories for storing parameters required in noise signal generation and scaling upon frame erasure, a frame erasure concealment processing controller for controlling the memories, and switches used in frame erasure concealment processing.

A stationary noise post-processing apparatus 300 is comprised of a noise generator 301, adder 202, scaling section 303 and frame erasure concealment processing controller 304.

Noise generator 301 has a configuration that adds, to the configuration of noise generator 201 shown in FIG. 5, memories 310 and 311 for storing parameters required in noise signal generation and scaling upon frame erasure, and switches 313 and 314 that close and open during frame erasure concealment processing. Similarly, scaling section 303 adds, to the configuration of scaling section 203 shown in FIG. 5, a memory 312 that stores parameters required in noise signal generation and scaling upon frame erasure and a switch 315 that closes and opens during frame erasure concealment processing.

The operation of stationary noise post-processing apparatus 300 will be described below. First, the operation of noise generator 301 will be explained.

Memory 310 stores the power (i.e. average noise power) of a stationary noise signal from average noise power calculator 126 via a switch 313, and outputs this to gain adjuster 215.

Switch 313 opens and closes in accordance with control signals from frame erasure concealment processing controller 304. Specifically, switch 313 opens when a control signal for performing frame erasure concealment processing is received as input, and stays closed otherwise. When switch 313 opens, memory 310 holds the power of the stationary noise signal in the immediately preceding subframe and provides that power to gain adjuster 215 on demand until switch 313 closes again.

Memory 311 stores the LPCs of the stationary noise signal from LSP/LPC converter 212 via switch 314, and outputs these to synthesis filter 211.

Switch 314 opens and closes in accordance with control signals from frame erasure concealment processing controller 304. Specifically, switch 314 opens when a control signal for performing frame erasure concealment processing is received as input, and stays closed otherwise. When switch 314 opens, memory 311 holds the LPCs of the stationary noise signal in the immediately preceding subframe and provides those LPCs to synthesis filter 211 on demand until switch 314 closes again.

The operation of scaling section 303 will be described below.

Memory 312 stores the scaling coefficient that is calculated in scaling coefficient calculator 216 and output via a switch 315, and memory 312 outputs this to inter-subframe smoother 217.

Switch 315 opens and closes in accordance with control signals from frame erasure concealment processing controller 304. Specifically, switch 315 opens when a control signal for performing frame erasure concealment processing is received as input, and stays closed otherwise. When switch 315 opens, memory 312 holds the scaling coefficient in the immediately preceding subframe and provides that scaling coefficient to inter-subframe smoother 217 on demand until switch 315 closes again.
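The behaviour shared by the three switch and memory pairs (313/310, 314/311 and 315/312) amounts to holding the last value received while the switch was closed. A minimal sketch, with a hypothetical class name and interface, is given below.

```python
class HoldMemory:
    """Hypothetical model of a memory fed through a concealment switch."""

    def __init__(self, initial):
        self.value = initial

    def read(self, new_value, concealment_active):
        """Return the parameter to use for the current subframe.

        While concealment is active the switch is open, so new_value is
        ignored and the value from the preceding subframe is held.
        """
        if not concealment_active:
            self.value = new_value  # switch closed: memory follows input
        return self.value


if __name__ == "__main__":
    noise_power = HoldMemory(0.0)
    print(noise_power.read(1.0, concealment_active=False))
    print(noise_power.read(5.0, concealment_active=True))
    print(noise_power.read(3.0, concealment_active=False))
```

The same object could hold a scalar (average noise power, scaling coefficient) or a list of LPCs, since the value is stored as given.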

Frame erasure concealment processing controller 304 receives, as input, a frame erasure indication obtained by error detection etc. and outputs a control signal to switches 313 to 315. The control signal is used for performing frame erasure concealment processing during the subframes in the lost frame and the subframes recovered immediately after the lost frame (error-recovered subframe(s)). The frame erasure concealment processing for the error-recovered subframes may be performed for a plurality of subframes (e.g. two subframes). Frame erasure concealment processing refers to the processing of interpolating the parameters and controlling the audio volume using frame information from before the lost frame, so as to prevent the quality of the decoded signal from deteriorating significantly due to loss of part of the subframes. In addition, if no significant power change occurs in the error-recovered subframe following the lost frame, the frame erasure concealment processing in the error-recovered subframe is not necessary.

With a general frame erasure concealment method, the current frame is extrapolated using earlier information. Extrapolated data causes deterioration of subjective quality, and so the signal power is attenuated gradually. However, if frame erasure occurs in a stationary noise period, the deterioration in subjective quality due to breaks in audio, which are caused by the attenuation of power, is often greater than the deterioration in subjective quality due to the distortion caused by the extrapolation. In particular, in packet communications as typified by internet communications, frames are sometimes lost consecutively, and the deterioration due to breaks in audio becomes significant. To avoid this, with the stationary noise post-processing apparatus according to the present invention, gain adjuster 215 calculates the gain adjustment coefficient for scaling in accordance with the average noise power from average noise power calculator 126 and multiplies this with the stationary noise signal. Furthermore, scaling coefficient calculator 216 calculates the scaling coefficient such that the power of the sum of the stationary noise signal and the post-filter output signal does not change significantly, and the signal multiplied with this scaling coefficient is output as the final output signal. By this means, it is possible to suppress the power variation in the final output signal and maintain the signal level of the stationary noise preceding frame erasure, and consequently minimize the deterioration in subjective quality due to breaks in audio.

Fourth Embodiment

FIG. 7 is a diagram showing a configuration of a speech decoding processing system according to the fourth embodiment of the present invention. The speech decoding processing system is comprised of code receiving apparatus 100, speech decoding apparatus 101 and stationary noise period detecting apparatus 102, which are explained in the description of the first embodiment, and stationary noise post-processing apparatus 300, which is explained in the description of the third embodiment. In addition, the speech decoding processing system may have stationary noise post-processing apparatus 200 explained in the description of the second embodiment, instead of stationary noise post-processing apparatus 300.

The operation of the speech decoding processing system will be described. Descriptions of the components of the system have been provided in the first to third embodiments with reference to FIG. 1, FIG. 5 and FIG. 6. Therefore, in FIG. 7, the same parts as in FIG. 1, FIG. 5 and FIG. 6 are assigned the same reference numerals as in those figures, and specific descriptions thereof are omitted.

Code receiving apparatus 100 receives a coded signal via the channel, separates various parameters from the signal and outputs these parameters to speech decoding apparatus 101. Speech decoding apparatus 101 decodes a speech signal from the parameters, and outputs a post-filter output signal and other necessary parameters, which are obtained during the decoding processing, to stationary noise period detecting apparatus 102 and stationary noise post-processing apparatus 300. Stationary noise period detecting apparatus 102 determines whether the current subframe represents a stationary noise period using the information from speech decoding apparatus 101, and outputs the determination result and other necessary parameters, which are obtained through the determination processing, to stationary noise post-processing apparatus 300.

In response to the post-filter output signal from speech decoding apparatus 101, stationary noise post-processing apparatus 300 generates a stationary noise signal using various parameter information from speech decoding apparatus 101 and the determination result and other parameter information from stationary noise period detecting apparatus 102, superimposes this stationary noise signal on the post-filter output signal, and outputs the result as the final post-filter output signal.

FIG. 8 is a flowchart showing the flow of the processing of the speech decoding system according to this embodiment. FIG. 8 only shows the flow of processing in stationary noise period detecting apparatus 102 and stationary noise post-processing apparatus 300 shown in FIG. 7, and the processing in code receiving apparatus 100 and speech decoding apparatus 101 is omitted because it can be implemented using general techniques. The operation of the processing subsequent to speech decoding apparatus 101 in the system will be described below with reference to FIG. 8. First, in ST501, variables stored in the memories are initialized in the speech decoding system according to this embodiment. FIG. 9 shows examples of memories to be initialized and their initial values.

Next, the processing of ST502 to ST505 is performed in a loop, until speech decoding apparatus 101 has no more post-filter output signal (that is, until speech decoding apparatus 101 stops the processing). In ST502, mode determination is made, and it is determined whether the current subframe represents a stationary noise period (stationary noise mode) or a speech period (speech mode). The processing in ST502 will be explained later in detail.

In ST503, stationary noise post-processing apparatus 300 performs processing of adding stationary noise (stationary noise post-processing). The flow of the stationary noise post-processing in ST503 will be explained later in detail. In ST504, scaling section 303 performs the final scaling processing. The flow of this scaling processing performed in ST504 will be explained later in detail.

In ST505, it is checked whether the current subframe is the last subframe, to determine whether to finish or continue the loop of ST502 to ST505. The loop processing is performed until speech decoding apparatus 101 has no more post-filter output signal (that is, until speech decoding apparatus 101 stops the processing). When processing exits from the loop, all processing of the speech decoding system according to this embodiment terminates.

The flow of mode determination processing in ST502 will be described below with reference to FIG. 10. First, in ST701, it is checked whether the current subframe is part of frame erasure.

If the current subframe is part of frame erasure, the flow proceeds to ST702, in which a predetermined value (3, in this example) is set on the hangover counter for the frame erasure concealment processing, and then to ST704. When frame erasure occurs, frame erasure concealment processing is still performed on some of the subframes following the frame erasure even if these subframes are correctly received, and the number of these subframes corresponds to the predetermined value set on the hangover counter.

If the current subframe is not part of frame erasure, the flow proceeds to ST703, where it is checked whether the value on the hangover counter for the frame erasure concealment processing is 0. If the value on the hangover counter is not 0, the value on the hangover counter is decremented by 1, and the flow proceeds to ST704.

In ST704, whether to perform frame erasure concealment processing is determined. If the current subframe is neither part of frame erasure nor in the hangover period immediately after the frame erasure, it is determined not to perform frame erasure concealment processing, and the flow proceeds to ST705. If the current subframe is part of frame erasure or is in the hangover period immediately after the frame erasure, it is determined to perform frame erasure concealment processing, and the flow proceeds to ST707.
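The decision of ST701 to ST704 may be sketched as follows. The function name and return convention are hypothetical, and the sketch assumes that a subframe counts as being in the hangover period when the counter is non-zero before it is decremented, so that the number of concealed subframes after the erasure equals the value set on the counter (3 in the example above).

```python
FEC_HANGOVER = 3  # example value from the description


def needs_concealment(frame_erased, counter, hangover=FEC_HANGOVER):
    """Return (conceal, new_counter) for one subframe."""
    if frame_erased:
        # ST702: erased subframe, so reload the hangover counter.
        return True, hangover
    if counter > 0:
        # ST703: correctly received, but still in the hangover period.
        return True, counter - 1
    # Neither erased nor in the hangover period: normal processing (ST705).
    return False, 0


if __name__ == "__main__":
    counter = 0
    for erased in [False, True, False, False, False, False]:
        conceal, counter = needs_concealment(erased, counter)
        print(erased, conceal, counter)
```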

In ST705, the smoothed adaptive code gain is calculated and the pitch history analysis is performed as explained in the description of the first embodiment, and the same descriptions will not be repeated. The pitch history analysis flow has been explained with reference to FIG. 2. After this processing, the flow proceeds to ST706. In ST706, mode selection is performed. The mode selection flow is shown in detail in FIG. 3 and FIG. 4. In ST708, the average LSPs of the signal in the stationary noise period calculated in ST706 are converted into LPCs. The processing in ST708 need not be performed immediately after ST706 and needs only to be performed before a stationary noise signal is generated in ST503.

If it is determined in ST704 to perform frame erasure concealment processing, then in ST707 the mode and the average LPCs of the signal in the stationary noise period from the preceding subframe are maintained in the current subframe, and the flow proceeds to ST709.

In ST709, the mode information of the current subframe (information showing whether the current subframe represents a stationary noise mode or speech signal mode) and the average LPCs of the signal in the stationary noise period of the current subframe are copied into memories. In addition, it is not always necessary to store information of the current mode in memories in this embodiment. However, this information needs to be kept in a memory if the mode determination result is used in other blocks (e.g. speech decoding apparatus 101). This concludes the description of the mode determination processing in ST502.

The flow of the processing of adding stationary noise in ST503 will be described below with reference to FIG. 11. First, in ST801, excitation generator 210 generates a random vector. Any random vector generation method may be employed, but, as explained in the description of the second embodiment, the method of random selection from fixed codebook 113 provided in speech decoding apparatus 101 is effective.

In ST802, LPC synthesis filtering processing is performed using the random vector generated in ST801 as excitation. In ST803, the noise signal synthesized in ST802 is subjected to band-limiting filtering processing, so that the bandwidth of the noise signal matches the bandwidth of the decoded signal from speech decoding apparatus 101. This processing is not mandatory. In ST804, the power of the synthesized noise signal, which has been subjected to band-limiting processing in ST803, is calculated.
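ST801 and ST802 can be illustrated with the sketch below. It substitutes a Gaussian random vector for the random selection from fixed codebook 113 (which the description permits, since any random vector generation method may be employed), and it assumes the sign convention A(z) = 1 + Σ a_k z^-k for the LPCs; the function names and the subframe length of 40 samples are hypothetical.

```python
import random


def lpc_synthesis(excitation, lpc, state=None):
    """All-pole synthesis filter 1/A(z), A(z) = 1 + sum(lpc[k] * z^-(k+1)).

    `state` holds the most recent outputs, newest first, so that filtering
    can continue across subframes. The sign convention is an assumption.
    """
    order = len(lpc)
    memory = list(state) if state is not None else [0.0] * order
    output = []
    for x in excitation:
        y = x - sum(a * m for a, m in zip(lpc, memory))
        output.append(y)
        if order:
            memory = [y] + memory[:-1]
    return output, memory


def generate_noise_subframe(lpc, length=40, rng=None, state=None):
    """ST801/ST802: random excitation followed by LPC synthesis filtering."""
    if rng is None:
        rng = random.Random(0)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return lpc_synthesis(excitation, lpc, state)


if __name__ == "__main__":
    impulse_response, _ = lpc_synthesis([1.0, 0.0, 0.0], [-0.5])
    print(impulse_response)
    noise, state = generate_noise_subframe([-0.5])
    print(len(noise), len(state))
```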

In ST805, the signal power obtained in ST804 is smoothed. The smoothing can be implemented easily by performing the autoregressive model smoothing processing shown in equation 1 between consecutive frames. The coefficient k for smoothing is determined depending on how smooth the stationary noise signal needs to be made. Preferably, relatively strong smoothing is performed (e.g. coefficient k is between 0.05 and 0.2), using equation 10.

In ST806, the ratio of the power of the stationary noise signal to be generated (calculated in ST1119) to the inter-subframe smoothed signal power from ST805 is calculated as a gain adjustment coefficient, as shown in equation 11. The calculated gain adjustment coefficient is smoothed per sample, as shown in equation 12, and is multiplied with the synthesized noise signal subjected to band-limiting filtering processing in ST803. The stationary noise signal multiplied by the gain adjustment coefficient is further multiplied by a predetermined constant (i.e. fixed gain). This multiplication with a fixed gain adjusts the absolute level of the stationary noise signal.

In ST807, the synthesized noise signal generated in ST806 is added to the post-filter output signal from speech decoding apparatus 101, and the power of the post-filter output signal after the addition is calculated.

In ST808, the ratio of the power of the post-filter output signal from speech decoding apparatus 101 to the power calculated in ST807 is calculated as a scaling coefficient using equation 13. The scaling coefficient is used in the scaling processing of ST504 performed after the processing of adding stationary noise.

Finally, adder 202 adds the synthesized noise signal (stationary noise signal) generated in ST806 and the post-filter output signal from speech decoding apparatus 101. This processing may be included in ST807. This concludes the description of the processing of adding stationary noise in ST503.

The flow in ST504 will be described below with reference to FIG. 12. First, in ST901, it is checked whether the current subframe is a target subframe for frame erasure concealment processing. If the current subframe is a target subframe for frame erasure concealment processing, the flow proceeds to ST902. If the current subframe is not a target subframe, the flow proceeds to ST903.

In ST902, frame erasure concealment processing is performed. That is, the scaling coefficient from the immediately preceding subframe is maintained in the current subframe, and then the flow proceeds to ST903.

In ST903, using the determination result from stationary noise period detecting apparatus 102, it is checked whether the current mode is the stationary noise mode. If the current mode is the stationary noise mode, the flow proceeds to ST904. If the current mode is not the stationary noise mode, the flow proceeds to ST905.

In ST904, the scaling coefficient is subjected to inter-subframe smoothing processing, using equation 1. In this case, the value of k is set at about 0.1. To be more specific, equation 14 is used, for example. The processing is performed to smooth the power variations between subframes in the stationary noise period. After the smoothing, the flow proceeds to ST905.

In ST905, the scaling coefficient is smoothed per sample, and the smoothed scaling coefficient is multiplied by the post-filter output signal to which the stationary noise generated in ST503 has been added. The smoothing is performed per sample using equation 1, and, in this case, the value of k is set at about 0.15. To be more specific, equation 15 is used, for example. This concludes the description of the scaling processing in ST504. The result is the post-filter output signal with stationary noise added and scaling applied.

The equations for smoothing and average value calculation are by no means limited to the equations provided herein, and the equation for smoothing may utilize the average value from certain earlier periods.

The present invention is not limited to the above-mentioned first to fourth embodiments and may be carried into practice in various other forms. For example, the stationary noise period detecting apparatus of the present invention is applicable to any decoder.

Furthermore, although cases have been described with the above embodiments where the present invention is implemented as a speech decoding apparatus, the present invention is by no means limited to this, and, for example, an equivalent speech decoding method may be implemented in software. For instance, a program for executing the speech decoding method may be stored in a ROM (Read Only Memory) and executed by a CPU (Central Processing Unit). It is equally possible to store a program for executing the speech decoding method in a computer readable storage medium, load the program from this storage medium into a RAM (Random Access Memory), and operate the program on a computer.

As is clear from the above descriptions of the embodiments, the present invention evaluates the degree of periodicity of a decoded signal using the adaptive code gain and pitch period, and, based on the degree of periodicity, determines whether a subframe represents a stationary noise period. Accordingly, if a signal arrives that is stationary but is not noisy (e.g. a sine wave or a stationary vowel), it is still possible to correctly determine the state of the signal.
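As a rough illustration of this periodicity check, the sketch below follows the pitch history analysis and the determination conditions recited in the claims. It is only a sketch under stated assumptions: classes are merged by chaining neighbouring sorted pitch values, the case where the number of groups equals the threshold is treated as not noise, the gating by the tentative determination is omitted, and every function name and threshold value is hypothetical.

```python
def count_pitch_groups(pitch_history, merge_threshold):
    """Count groups of similar pitch periods in the recent subframe history.

    Each distinct pitch period forms its own class. Classes whose pitch
    periods differ by less than merge_threshold are merged into one group
    (assumption: merging chains through neighbouring sorted values).
    """
    classes = sorted(set(pitch_history))
    if not classes:
        return 0
    groups = 1
    for prev, cur in zip(classes, classes[1:]):
        if cur - prev >= merge_threshold:
            groups += 1  # gap is large enough to start a new group
    return groups


def is_stationary_noise(pitch_history, smoothed_gain, signal_power, noise_power,
                        merge_threshold=2, group_threshold=3,
                        gain_threshold=0.5, power_factor=2.0):
    """Decide whether the current period is stationary noise.

    Few groups mean a stable pitch, i.e. a periodic (speech) signal.
    Noise requires many groups, a low smoothed adaptive codebook gain,
    and power below power_factor times the average background noise power.
    All default values are placeholders, not values from the embodiments.
    """
    groups = count_pitch_groups(pitch_history, merge_threshold)
    return (groups > group_threshold
            and smoothed_gain < gain_threshold
            and signal_power < power_factor * noise_power)
```

A stationary vowel yields pitch periods that stay within a sample or so of each other and therefore collapse into a single group, whereas noise produces scattered pitch values and many groups. This is what lets the determiner separate the two even when both look stationary in terms of the spectrum envelope.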

This application is based on Japanese Patent Application No. 2000-366342, filed on Nov. 30, 2000, the entire content of which is expressly incorporated by reference herein.

INDUSTRIAL APPLICABILITY

The present invention is suitable for use in speech decoding apparatuses in mobile communication systems and in packet communication systems, including internet communication systems.

1. A stationary noise period detecting apparatus comprising: a pitch history analyzer that classifies pitch periods of a plurality of past subframes into one or more classes in a way in which different pitch periods are classified to different classes, groups classes where a difference between the pitch periods classified to those classes is less than a predetermined first threshold into one group when there are a plurality of classes, and obtains a number of the groups as an analysis result; and a determiner that determines that a signal period where the analysis result is less than a predetermined second threshold is a speech period.
2. The stationary noise period detecting apparatus according to claim 1, further comprising: an average LSP calculator that calculates an average of LSP vectors of a signal of a stationary noise period; a distance calculator that calculates a distance between an LSP vector in a current subframe and the average LSP calculated by the average LSP calculator; and a tentative determiner that tentatively determines that a period where a fluctuation amount of an LSP vector between subframes is less than a predetermined third threshold and the distance calculated by the distance calculator is less than a predetermined fourth threshold, is a stationary noise period, wherein: the determiner performs determination processing only when the tentative determiner determines that a period is a stationary noise period.
3. The stationary noise period detecting apparatus according to claim 2, further comprising: a smoother that smoothes adaptive codebook gains between subframes; and a signal power calculator that calculates signal power of the stationary noise period determined by the tentative determiner, wherein: the determiner determines that a signal period where the analysis result is greater than the second threshold, the smoothed adaptive codebook gains are less than a predetermined fifth threshold, and the signal power calculated by the signal power calculator is less than a value obtained by multiplying average power of a background noise signal by a predetermined value, is a stationary noise period.
4. A stationary noise period detection method comprising: a pitch history analyzing step of classifying pitch periods of a plurality of past subframes into one or more classes in a way in which different pitch periods are classified to different classes, grouping classes where a difference between the pitch periods classified to those classes is less than a predetermined first threshold into one group when there are a plurality of classes, and obtaining a number of the groups as an analysis result; and a determining step of determining that a signal period where the analysis result is less than a predetermined second threshold is a speech period.
5. The stationary noise period detection method according to claim 4, further comprising: an average LSP calculating step of calculating an average of LSP vectors of a signal of a stationary noise period; a distance calculating step of calculating a distance between an LSP vector in a current subframe and the average LSP calculated by the average LSP calculator; and a tentative determining step of tentatively determining that a period where a fluctuation amount of an LSP vector between subframes is less than a predetermined third threshold and the distance calculated by the distance calculator is less than a predetermined fourth threshold, is a stationary noise period, wherein in the determining step, determination processing is performed only when a period is determined to be a stationary noise period in the tentative determining step.
6. The stationary noise period detection method according to claim 5, further comprising: a smoothing step of smoothing adaptive codebook gains between subframes; and a signal power calculating step of calculating signal power of the stationary noise period determined in the determining step, wherein: in the determining step, a signal period where the analysis result is greater than the second threshold, the smoothed adaptive codebook gains are less than a predetermined fifth threshold, and the signal power calculated in the signal power calculating step is less than a value obtained by multiplying average power of a background noise signal by a predetermined value, is determined to be a stationary noise period.